Exploration-Exploitation Strategies in Deep Q-Networks Applied to Route-Finding Problems

Similar resources

Efficient Exploration through Bayesian Deep Q-Networks

We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) Algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...

Regular Algebra Applied to Path-finding Problems

In an earlier paper, one of the authors presented an algebra for formulating and solving extremal path problems. There are striking similarities between that algebra and the algebra of regular languages, which lead one to consider whether the previous results can be generalized—for instance to path enumeration problems—and whether the algebra of regular languages can itself be profitably used f...

Deep Abstract Q-Networks

We examine the problem of learning and planning on high-dimensional domains with long horizons and sparse rewards. Recent approaches have shown great successes in many Atari 2600 domains. However, domains with long horizons and sparse rewards, such as Montezuma’s Revenge and Venture, remain challenging for existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup, and Singh 19...

Human and Optimal Exploration and Exploitation in Bandit Problems

We consider a class of bandit problems in which a decision-maker must choose between a set of alternatives, each of which has a fixed but unknown rate of reward, to maximize their total number of rewards over a short sequence of trials. Solving these problems requires balancing the need to search for highly-rewarding alternatives with the need to capitalize on those alternatives already known to be...

The Exploration-Exploitation Tradeoff in Sequential Decision Making Problems

Sequential decision making problems often require an agent to act in an environment where data is noisy or not fully observed. The agent will have to learn how different actions relate to different rewards, and must therefore balance the need to explore and exploit in an effective strategy. In this report, sequential decision making problems are considered through extensions of the multi-armed ...


Journal

Journal title: Journal of Physics: Conference Series

Year: 2020

ISSN: 1742-6588, 1742-6596

DOI: 10.1088/1742-6596/1684/1/012073